subsample 0
7fd3b80fb1884e2927df46a7139bb8bf-Supplemental.pdf
The IDs of the 10 datasets used in this work, as well as the number of examples and features, are provided in Table 1 in the main manuscript. All of the datasets correspond to binary classification problems, with varying degrees of class imbalance. While the prediction is always performed in the logarithmic domain, when evaluating the models we transform both the labels and the model predictions back into their original domain. The loss function used for training and evaluation is the standard root mean-squared error (sklearn.metrics.mean_squared_error). We download the raw data programmatically using the Kaggle API, which produces the filetrain.tsv.